97 research outputs found

    Optimizing the CVaR via Sampling

    Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application, and learn a risk-sensitive controller for the game of Tetris.
    Comment: To appear in AAAI 2015
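    The following is a minimal sketch of a sampling-based, likelihood-ratio-style CVaR gradient estimate of the kind the abstract describes. The function name, the `score_fn` interface, and the exact tail weighting are illustrative assumptions, not the paper's implementation.

    ```python
    import numpy as np

    def cvar_gradient_estimate(samples, score_fn, alpha=0.95):
        """Illustrative sampling-based CVaR gradient sketch.

        samples:  array of cost samples Z_i drawn from the theta-dependent distribution
        score_fn: callable returning grad_theta log f(Z_i; theta) for a single sample
                  (hypothetical interface assumed for this sketch)
        alpha:    CVaR level; the worst (1 - alpha) fraction of costs forms the tail
        """
        z = np.asarray(samples, dtype=float)
        # Empirical alpha-quantile (Value at Risk) of the sampled costs.
        var_alpha = np.quantile(z, alpha)
        # Keep only the tail samples at or above the empirical VaR.
        tail = z >= var_alpha
        # Likelihood-ratio style estimate: average the score of each tail sample,
        # weighted by its excess cost over the empirical VaR.
        scores = np.stack([score_fn(zi) for zi in z[tail]])
        return np.mean((z[tail] - var_alpha)[:, None] * scores, axis=0)
    ```

    Such an estimate could then drive a stochastic gradient descent loop on the distribution parameters, which is the setting in which the abstract's convergence result is stated.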

    Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

    We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm (Sutton et al., 2015), which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms, as special cases. We call this framework ETD(λ, β), where our introduced parameter β controls the decay rate of an importance-sampling term. We study conditions under which the projected fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results show that the original ETD algorithm always involves a contraction operator, and its bias is bounded. Moreover, by controlling β, our proposed generalization allows trading off bias for variance reduction, thereby achieving a lower total error.
    Comment: arXiv admin note: text overlap with arXiv:1508.0341
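    Below is a rough sketch of an emphatic-TD style off-policy update in which a parameter β decays the importance-sampling (follow-on) term, as the abstract describes. The exact trace and emphasis recursions, the linear features, and the data interface are all assumptions made for illustration, not the paper's algorithm.

    ```python
    import numpy as np

    def etd_beta_evaluation(trajectory, phi, theta, alpha=0.01, gamma=0.99,
                            lam=0.0, beta=0.99):
        """Illustrative beta-decayed emphatic TD evaluation sketch.

        trajectory: iterable of (s, s_next, reward, rho) tuples, where rho is the
                    per-step importance-sampling ratio pi(a|s) / mu(a|s)
        phi:        feature function mapping a state to a feature vector
        theta:      initial weight vector for the linear value estimate
        """
        theta = np.array(theta, dtype=float)
        e = np.zeros_like(theta)   # eligibility trace
        F = 1.0                    # follow-on trace, decayed here by beta (assumption)
        for s, s_next, r, rho in trajectory:
            x, x_next = phi(s), phi(s_next)
            delta = r + gamma * theta @ x_next - theta @ x   # TD error
            M = lam + (1.0 - lam) * F                        # emphasis weighting
            e = rho * (gamma * lam * e + M * x)              # emphatic eligibility trace
            theta = theta + alpha * delta * e
            F = beta * rho * F + 1.0                         # beta-decayed follow-on term
        return theta
    ```

    In this reading, a smaller β shortens the memory of the importance-sampling term, which is one way the bias-for-variance trade-off mentioned in the abstract could be exposed to the user.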